The ImageGear Recognition component provides access to document recognition technology.
This core software module provides recognition for almost any document, including those produced on typewriters, dot-matrix printers, ink-jet printers, laser printers, and phototypesetters, as well as photocopied and faxed versions of any document.
Using the ImageGear Recognition component, you can recognize a document image and export the text in one of the supported output formats. These formats include 8-bit ASCII and 16-bit Unicode text, Microsoft Word, HTML, and others.
The ImageGear Recognition component also allows a user to:
- Manage delineated zones of a document page and then specify treatment for those zones. This includes the ability to correct the OCR engine's automatic segmentation between the segmentation phase and the recognition phase.
- Process both text and graphics. The recognition software's ability to distinguish graphics from text can provide the basis of a compound document processing system.
- Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
- Recognize Chinese Simplified, Chinese Traditional, Japanese, and Korean languages.
Portions of this document are excerpted from OmniPage Capture SDK 18.6 reference materials. Copyright © 2011 Nuance Communications, Inc. All Rights Reserved. Nuance and the Nuance logo are trademarks or registered trademarks of Nuance Communications, Inc. or its affiliates in the United States and/or other countries.
The figure below illustrates a simple application that takes a page image as input and produces recognized text in a variety of output formats.
This section provides information about the following:
The very first working code

The code example below demonstrates loading and recognizing the Image.tif image file. The image contains English, machine-printed text. The result is saved in the Image.txt text file.
```c
#include <windows.h>   // Windows includes
#include "gear.h"      // Include for Accusoft ImageGear
#include "i_rec.h"     // Include for ImageGear Recognition component

// Loading and recognizing the Image.tif image file.
// The result will be saved in the Image.txt text file.
VOID Using_VeryFirst()
{
    AT_ERRCOUNT nErrCount;
    HIGEAR hIGear;
    HIG_REC_IMAGE hImg;

    nErrCount = IG_comm_comp_attach("REC");
    nErrCount = IG_REC_initialize();
    nErrCount = IG_load_file("Image.tif", &hIGear);
    nErrCount = IG_REC_image_import(hIGear, &hImg);
    nErrCount = IG_image_delete(hIGear);
    nErrCount = IG_REC_output_codepage_set("Windows ANSI");
    nErrCount = IG_REC_output_text_format_set(IG_REC_DTXT_TXTS);
    nErrCount = IG_REC_image_recognize(hImg);
    nErrCount = IG_REC_output_direct_text_write(&hImg, 1, "Image.txt");
    nErrCount = IG_REC_image_delete(hImg);
    nErrCount = IG_REC_close();
}
```
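The example above stores every return value in nErrCount but never inspects it, so a failed load or import would go unnoticed until a later call. The sketch below shows one possible way to stop at the first failing step. It is not part of the ImageGear API: `err_count_t`, `step_t`, `run_steps`, `step_ok`, and `step_fail` are hypothetical names introduced for illustration, and the sketch assumes that a return value of zero means no errors were recorded. The stub steps stand in for the SDK calls so that the code compiles without the SDK installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the SDK's AT_ERRCOUNT type.
   Assumption: 0 means no errors were recorded. */
typedef int err_count_t;

/* One pipeline step: a label for diagnostics plus the call to make. */
typedef struct {
    const char *name;
    err_count_t (*run)(void);
} step_t;

/* Runs the steps in order and stops at the first non-zero result.
   Returns the index of the failing step, or n if every step succeeded. */
size_t run_steps(const step_t *steps, size_t n)
{
    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        err_count_t errs = steps[i].run();
        if (errs != 0) {
            fprintf(stderr, "%s failed with %d error(s)\n",
                    steps[i].name, errs);
            return i;
        }
    }
    return n;
}

/* Stub steps (hypothetical) so the sketch builds without the SDK. */
err_count_t step_ok(void)   { return 0; }
err_count_t step_fail(void) { return 2; }
```

In a real application, each step would be a small wrapper around one of the calls from the example above, such as IG_REC_initialize or IG_REC_image_recognize. The sketch only reports where the sequence stopped. Releasing any handles that were already created, and closing the component, is still up to the caller.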